{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "caa5f2d5-28bb-4ce9-8a11-92646b3a9f6c",
   "metadata": {},
   "source": [
    "<!---\n",
    "  Licensed to the Apache Software Foundation (ASF) under one\n",
    "  or more contributor license agreements.  See the NOTICE file\n",
    "  distributed with this work for additional information\n",
    "  regarding copyright ownership.  The ASF licenses this file\n",
    "  to you under the Apache License, Version 2.0 (the\n",
    "  \"License\"); you may not use this file except in compliance\n",
    "  with the License.  You may obtain a copy of the License at\n",
    "\n",
    "    http://www.apache.org/licenses/LICENSE-2.0\n",
    "\n",
    "  Unless required by applicable law or agreed to in writing,\n",
    "  software distributed under the License is distributed on an\n",
    "  \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
    "  KIND, either express or implied.  See the License for the\n",
    "  specific language governing permissions and limitations\n",
    "  under the License.\n",
    "-->\n",
    "\n",
    "# GeoPandas Interoperability\n",
    "\n",
    "> Note: Before running this notebook, ensure that you have installed SedonaDB: `pip install \"apache-sedona[db]\"`\n",
    "\n",
    "This notebook shows how to leverage GeoPandas with SedonaDB for large-scale geospatial analysis.\n",
    "\n",
    "You'll learn how to:\n",
    "\n",
    "- Read common geospatial file formats, such as GeoJSON and FlatGeobuf, into a GeoPandas GeoDataFrame.\n",
    "- Convert the GeoDataFrame into a SedonaDB DataFrame for large-scale analysis.\n",
    "\n",
    "Because the conversion starts from a GeoDataFrame, any file format that GeoPandas can read can also be loaded into a SedonaDB DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "0434bead-2628-4844-a3f6-2f9c15a21899",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sedona.db\n",
    "import geopandas as gpd\n",
    "\n",
    "sd = sedona.db.connect()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "618b5d1d-ac2b-4786-ae5b-0d10efd6a8d4",
   "metadata": {},
   "source": [
    "## Read a GeoJSON file with GeoPandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "2691bd24-9b2d-4cf9-958d-4ef01d967cb3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read a local GeoJSON file (sample_geometries.json must be in the working directory)\n",
    "gdf = gpd.read_file(\"sample_geometries.json\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "cd367a73-acd3-41cf-b892-7d863c370d5f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>prop0</th>\n",
       "      <th>prop1</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>value0</td>\n",
       "      <td>None</td>\n",
       "      <td>POINT (102 0.5)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>value1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>LINESTRING (102 0, 103 1, 104 0, 105 1)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>value2</td>\n",
       "      <td>{ \"this\": \"that\" }</td>\n",
       "      <td>POLYGON ((100 0, 101 0, 101 1, 100 1, 100 0))</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    prop0               prop1                                       geometry\n",
       "0  value0                None                                POINT (102 0.5)\n",
       "1  value1                 0.0        LINESTRING (102 0, 103 1, 104 0, 105 1)\n",
       "2  value2  { \"this\": \"that\" }  POLYGON ((100 0, 101 0, 101 1, 100 1, 100 0))"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "454e08a3-de65-4151-9d29-5d5ee8cf31d3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'geopandas.geodataframe.GeoDataFrame'>\n",
      "RangeIndex: 3 entries, 0 to 2\n",
      "Data columns (total 3 columns):\n",
      " #   Column    Non-Null Count  Dtype   \n",
      "---  ------    --------------  -----   \n",
      " 0   prop0     3 non-null      object  \n",
      " 1   prop1     2 non-null      object  \n",
      " 2   geometry  3 non-null      geometry\n",
      "dtypes: geometry(1), object(2)\n",
      "memory usage: 204.0+ bytes\n"
     ]
    }
   ],
   "source": [
    "gdf.info()"
   ]
  },
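  {
   "cell_type": "markdown",
   "id": "5e3b8c1a-7d2f-4a6e-9b0c-1f4d2e8a6c31",
   "metadata": {},
   "source": [
    "If `sample_geometries.json` is not available in your working directory, the following sketch builds a comparable GeoDataFrame directly from Shapely geometries. It reuses the `prop0` values and geometries shown above, omits `prop1`, and assumes an `EPSG:4326` CRS (the GeoJSON default). The variable name `gdf_inline` is illustrative and is chosen so that it does not overwrite `gdf`. The result can be passed to `sd.create_data_frame()` in the same way as a GeoDataFrame read from a file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a4f0d2c-3b6e-4c91-a7d5-2e9f1b0c4d72",
   "metadata": {},
   "outputs": [],
   "source": [
    "import geopandas as gpd\n",
    "from shapely.geometry import LineString, Point, Polygon\n",
    "\n",
    "# Build an in-memory GeoDataFrame with the same prop0 values and geometries as above\n",
    "gdf_inline = gpd.GeoDataFrame(\n",
    "    {\"prop0\": [\"value0\", \"value1\", \"value2\"]},\n",
    "    geometry=[\n",
    "        Point(102, 0.5),\n",
    "        LineString([(102, 0), (103, 1), (104, 0), (105, 1)]),\n",
    "        # Shapely closes the polygon ring, so the first vertex need not be repeated\n",
    "        Polygon([(100, 0), (101, 0), (101, 1), (100, 1)]),\n",
    "    ],\n",
    "    crs=\"EPSG:4326\",  # assumed CRS (the GeoJSON default)\n",
    ")\n",
    "\n",
    "# A single geometry column can hold a mix of geometry types\n",
    "print(gdf_inline.geom_type.tolist())"
   ]
  },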
  {
   "cell_type": "markdown",
   "id": "a5837268-1620-4b2b-bf37-cb6e282daedf",
   "metadata": {},
   "source": [
    "### Convert the GeoPandas GeoDataFrame to a SedonaDB DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "385f6333-411d-4d1f-a09b-8816cccceabc",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = sd.create_data_frame(gdf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "186059ae-4cf8-48ec-878a-71e7a39ac07e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "┌────────┬────────────────────┬──────────────────────────────────────────┐\n",
      "│  prop0 ┆        prop1       ┆                 geometry                 │\n",
      "│  utf8  ┆        utf8        ┆                 geometry                 │\n",
      "╞════════╪════════════════════╪══════════════════════════════════════════╡\n",
      "│ value0 ┆                    ┆ POINT(102 0.5)                           │\n",
      "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ value1 ┆ 0.0                ┆ LINESTRING(102 0,103 1,104 0,105 1)      │\n",
      "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ value2 ┆ { \"this\": \"that\" } ┆ POLYGON((100 0,101 0,101 1,100 1,100 0)) │\n",
      "└────────┴────────────────────┴──────────────────────────────────────────┘\n"
     ]
    }
   ],
   "source": [
    "df.show()"
   ]
  },
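  {
   "cell_type": "markdown",
   "id": "c2d7e9f1-4a8b-4f3c-b6e0-7d1a5c9e2b48",
   "metadata": {},
   "source": [
    "Because every conversion goes through a GeoDataFrame, the read-then-convert pattern can be wrapped in a small helper. The sketch below is hypothetical: `read_into_sedona` is not part of the SedonaDB or GeoPandas API, and it only assumes that the connection object passed in provides the `create_data_frame()` method used above. Any extra keyword arguments are forwarded unchanged to `gpd.read_file()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6b1a3d8-9c2f-4e57-8a0d-3f6c2b7e1a95",
   "metadata": {},
   "outputs": [],
   "source": [
    "import geopandas as gpd\n",
    "\n",
    "\n",
    "def read_into_sedona(sd, path, **read_kwargs):\n",
    "    \"\"\"Hypothetical helper: read a file with GeoPandas, then convert it to a SedonaDB DataFrame.\"\"\"\n",
    "    # GeoPandas (via GDAL) handles the file format: GeoJSON, FlatGeobuf, Shapefile, GeoPackage, ...\n",
    "    gdf = gpd.read_file(path, **read_kwargs)\n",
    "    # The connection converts the in-memory GeoDataFrame into a SedonaDB DataFrame\n",
    "    return sd.create_data_frame(gdf)\n",
    "\n",
    "\n",
    "# Example usage (assumes sample_geometries.json is in the working directory):\n",
    "# df = read_into_sedona(sd, \"sample_geometries.json\")"
   ]
  },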
  {
   "cell_type": "markdown",
   "id": "2f09cbbe-86b5-4eb4-b920-1b12f018d1a6",
   "metadata": {},
   "source": [
    "## Read and convert data from a FlatGeobuf file\n",
    "\n",
    "The following cells read a remote FlatGeobuf file of Natural Earth cities from the geoarrow-data repository with GeoPandas and then convert it to a SedonaDB DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "965ae9f3-293b-4e8e-92bf-1359a482bca3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read a FlatGeobuf file with GeoPandas\n",
    "path = \"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities.fgb\"\n",
    "gdf = gpd.read_file(path)"
   ]
  },
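  {
   "cell_type": "markdown",
   "id": "f3a9c6e2-1d4b-4b8a-9e72-6c0d8f5a3b17",
   "metadata": {},
   "source": [
    "For large files it can help to load only the features you need before converting to SedonaDB. `gpd.read_file()` accepts a `bbox` argument `(minx, miny, maxx, maxy)` that filters features while reading. The sketch below demonstrates this offline: it writes the three cities from the `df.show(3)` output to a temporary local FlatGeobuf file and reads back only those inside a bounding box over central Italy. It assumes your GDAL build includes the FlatGeobuf driver (built into GDAL 3.1 and later); the names `cities`, `fgb_path`, and `subset` are illustrative. The same `bbox` argument can be passed when reading the remote URL above, where FlatGeobuf's spatial index may let GDAL avoid downloading the whole file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7e2b5c9-6f1d-4d3a-8c48-0b9e4f2d6a13",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import tempfile\n",
    "\n",
    "import geopandas as gpd\n",
    "from shapely.geometry import Point\n",
    "\n",
    "# Three cities taken from the df.show(3) output (longitude/latitude, EPSG:4326)\n",
    "cities = gpd.GeoDataFrame(\n",
    "    {\"name\": [\"Vatican City\", \"San Marino\", \"Vaduz\"]},\n",
    "    geometry=[\n",
    "        Point(12.4533865, 41.9032822),\n",
    "        Point(12.4417702, 43.9360958),\n",
    "        Point(9.5166695, 47.1337238),\n",
    "    ],\n",
    "    crs=\"EPSG:4326\",\n",
    ")\n",
    "\n",
    "# Write a small local FlatGeobuf file so the example runs without network access\n",
    "fgb_path = os.path.join(tempfile.mkdtemp(), \"cities.fgb\")\n",
    "cities.to_file(fgb_path, driver=\"FlatGeobuf\")\n",
    "\n",
    "# Read back only the features that intersect the bounding box (minx, miny, maxx, maxy)\n",
    "subset = gpd.read_file(fgb_path, bbox=(12.0, 41.0, 13.0, 44.0))\n",
    "print(sorted(subset[\"name\"].tolist()))"
   ]
  },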
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "372c937f-da36-4e4b-98da-347890318a80",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convert the GeoPandas GeoDataFrame to a SedonaDB DataFrame\n",
    "df = sd.create_data_frame(gdf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "d99f4474-da3a-4834-9675-184a667b2a90",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "┌──────────────┬──────────────────────────────┐\n",
      "│     name     ┆           geometry           │\n",
      "│     utf8     ┆           geometry           │\n",
      "╞══════════════╪══════════════════════════════╡\n",
      "│ Vatican City ┆ POINT(12.4533865 41.9032822) │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ San Marino   ┆ POINT(12.4417702 43.9360958) │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ Vaduz        ┆ POINT(9.5166695 47.1337238)  │\n",
      "└──────────────┴──────────────────────────────┘\n"
     ]
    }
   ],
   "source": [
    "df.show(3)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv (3.13.3)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
